Regular Expression Matching Can Be Simple And Fast is worth a look. re::engine::Plan9 should let you test this approach.
Regular expression matching can be simple and fast, using finite automata-based techniques that have been known for decades. In contrast, Perl, PCRE, Python, Ruby, Java, and many other languages have regular expression implementations based on recursive backtracking that are simple but can be excruciatingly slow.
We're indeed talking about the regular expression/a?a?a?aaa/
which no experienced Perl code would ever use in real code.
Rewrite it, for example, as
and try again. (Caveat: I haven't benchmarked it myself.)/(?:a(?:a(?:a)?)?)?aaa/
The problem isn't as much of a problem as these guys want to make you believe.
Re:It's a very nice article
Aristotle on 2008-01-14T20:16:50
Rewrite it, for example, as
/(?:a(?:a(?:a)?)?)?aaa/ Err, come again? Surely you mean “
/a{3,6}/
”?Re:It's a very nice article
bart on 2008-01-15T20:19:38
Even better.:) I'm just saying that every pathetic regexp can most likely be rewritten into something that is not pathetic witout much trouble — and possibly even be optimized.
I thought the article was important enough to post but I have to confess, I've never run into the problems shown in the article. I've even written my own (simple) regex matcher in C without running into those problems. Maybe this is an example of Jamie Zawinski's
Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems.
or Abraham Maslow's:
If you only have a hammer then you treat everything like a nail.
Sometimes, you really don't want to use regexes...